Phylogeographic patterns of the yellow fever virus around the metropolitan region of São Paulo, Brazil, 2016–2019

From 2016 to 2019, the largest outbreak caused by the Yellow Fever virus (YFV) in the 21st century in the Americas occurred in southeastern Brazil. A sylvatic cycle of transmission was reported near densely populated areas, such as the large metropolitan area of the city of São Paulo. Here, we describe the origin, spread, and movement of the YFV throughout the state of São Paulo. Whole-genome sequences were obtained from tissues of two patients who died due to severe yellow fever, during 2018–2019. Molecular analysis indicated that all analyzed tissues were positive for YFV RNA, with the liver being the organ with the highest amount of viral RNA. Sequence analysis indicates that genomes belonged to the South American genotype I and were grouped in the epidemic clade II, which includes sequences from the states of Goiás, Minas Gerais, and São Paulo of previous years. The analysis of viral dispersion indicates that the outbreak originated in Goiás at the end of 2014 and reached the state of São Paulo through the state of Minas Gerais after 2016. When the virus reached near the urban area, it spread towards both the east and south regions of the state, not establishing an urban transmission cycle in the metropolitan region of São Paulo. The virus that moved towards the east met with YFV coming from the south of the state of Rio de Janeiro, and the YFV that was carried to the south reached the Brazilian states located in the south region of the country.

a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 described the whole-genome sequencing of two yellow fever cases that occurred in late 2018 and early 2019, in two different locations of the state of São Paulo. The main objective of the present study was to analyze and compare YFV sequences from recent cases to sequences available on GenBank since 2015 (all from clade II of the outbreak) in the states of Goiás, Minas Gerais, and São Paulo, to better estimate the routes of introduction, dispersion, and escape of the virus during the entire period of the outbreak.

Ethical statement
The human autopsies reported in this study were performed after obtaining the written consent of the family members and following the protocol approved by the research ethics committee of the Clinical Hospital of the University of São Paulo School of Medicine (HCFMUSP) (CAAE protocol number: 18781813.2.0000.0068). All the methods were performed following the relevant guidelines and regulations of the ethics committee of the HCFMUSP following the approved protocol. All laboratory procedures listed below were performed in a biosafety level 2 laboratory (BSL-2), under the Brazilian standards of the Ministry of Health for Biological Agents Risk Classification [25].

Patients and samples
From November/2018 to February/2019, we investigated three patients suspected of yellow fever, all of which were later diagnosed as fatal cases of YFV infection. The suspected case definition of yellow fever was established by the Brazilian Ministry of Health in consonance with the Health Department of the state of São Paulo. Suspected yellow fever cases are defined as illness presenting sudden onset high fever associated with jaundice and/or haemorrhage who had lived or had visited areas with evidence of YFV circulation (i.e., epizootics in NHP or isolation of YFV from vectors), regardless of the vaccine status, during the preceding 15 days.
Confirmed cases had compatible clinical presentation and laboratory confirmation by at least one of the following methods: (i) Seropositive by Immunoglobulin M antibody capture enzyme-linked immunosorbent assay (MAC-ELISA); (ii) detection of YFV RNA by Real-time quantitative reverse transcription-polymerase chain reaction (RT-qPCR) in blood samples; (iii) virus isolation; (iv) histopathology compatible with yellow fever hepatitis with detectable antigen in tissues by immunohistochemistry technique. All cases received the definitive laboratory diagnosis of yellow fever by the Adolfo Lutz Institute, the state Reference Laboratory. Previous exposure or co-infection by Hepatitis A virus (HVA), B (HBV), C (HCV), Cytomegalovirus (CMV), Herpes virus (HSV), DENV, CHIKV, Human Immunodeficiency virus type 1 (HIV-1), Leptospira, and other non-infectious diseases aetiologies for acute hepatitis were accessed and cases were excluded following clinical diagnostic methods.
Epidemiological data, including demographic aspects, clinical data, including pre-existing medical conditions, clinical signs and symptoms, and in-hospital follow-up until death, as well as laboratory data, including diagnostic tests, were collected from all patients.

Autopsy protocol and tissue processing
The Service of Verification of Deaths of the Capital-USP (SVOC-USP) investigated deaths due to yellow fever from November/2018 to February/2019. Autopsies were performed following the Letulle technique, where all the organs were removed en masse (one block), requiring dissection organ by organ to examine them individually. Briefly, the dissection to all the patients were performed in the following organs: (i) heart; (ii) lung; (iii) brain; (iv) kidney; (v) spleen; (vi) pancreas; (vii) liver; and (viii) testicle.

Molecular characterization
Nucleic acid extraction from all collected tissues using 50 mg of the material was performed using the TRIzol reagent (Life Technologies, Carlsbad, CA, USA) and carried out according to the manufacturer's instructions. Molecular detection of YFV was performed by AgPath-ID One-Step RT-PCR Reagents (Ambion, Austin, TX, USA) with specific primers/probe, as previously described [26]. To identify cases of rare adverse events from yellow fever vaccination we used specific primers/probe for the detection of the attenuated viral strain 17D, the vaccine virus [27]. To quantify the genomic viral concentration in each tissue, we used a doublestranded DNA fragment containing the amplification region (Exxtend, SP, Brazil), which from several known molecules, was diluted and used to estimate the viral RNA concentration by linear regression. RT-qPCR reactions were run in ABI7500 equipment (Thermo Fisher Scientific, Waltham, MA, USA). To visualize the spatial location of the positive patients, we used R software, geobr package, (MIT license https://ipeagit.github.io/geobr/) [28] to create the maps.

Sequencing and viral genome assembly
Based on the RNA viral concentration and in our previous protocol to obtain viral complete genomes from human tissues [18,29], positive samples had the total RNA re-extracted by the abovementioned procedure. RNA samples were then purified and concentrated with DNase I and RNA Clean and Concentrator -5 kit (Zymo Research, Irvine, CA, USA). The paired-end RNA libraries were constructed and validated using the TruSeq Stranded Total RNA HT sample prep kit (Illumina, San Diego, CA, USA). Sequencing was done at the Core Facility for Scientific Research-University of São Paulo (CEFAP-USP/GENIAL) using the Illumina MiSeq platform. Each sample was barcoded individually, which allowed the separation of reads for each patient. Short unpaired reads and low-quality bases and reads were removed using Trimmomatic version 0.39 (LEADING:20 TRAILING:20 SLIDINGWINDOW:4:25 MINLEN:36) [30]. Consensus genomes were assembled with paired-end reads using Bowtie2 version 2.4.1 using default parameters [31]. To access the coverage and generate the consensus genome (Fig A in S1 Text), we used UGENE v.33.0 [32].

Data sets
All the complete genomic sequences available for YFV (until August 20, 2020), and their associated information, such as location and year of detection were recovered from the National Center for Biotechnology Information (NCBI) (https://www.ncbi.nlm.nih.gov/genbank/) website in GenBank format. All sequences were converted into FASTA format, and to avoid duplicated and low-quality sequences with many unknown nucleotides (N), we set an exclusion threshold for sequences that had more than 1% of "N" concerning their total length (https:// biopython.org/wiki/Sequence_Cleaner). These sequences were grouped with the sequences isolated from patients in São Paulo 2018-2019, the FASTA sequences were aligned using Clustal Omega [33] and recombinant sequences were screened using all algorithms implemented in RDP5 Beta 5.05 program (RDP, GENECONV, MaxChi, BootScan, and Siscan) using the standard settings [34]. The alignment of recombinant free sequences was manually inspected and edited using the program AliView v.1.18 [35].
To determine which genotypes and lineages circulated in the state of São Paulo during the outbreak, two datasets were constructed: (i) the first (dataset-1) included all the sequences deposited at GenBank (before August-2020), with the sequences determined in our study (n = 264 sequences) (Table A in S1 Text); (ii) the second (dataset-2), based in the phylogenetic analysis of the dataset-1, included all sequences deposited at GenBank (before August-2020) belonging to the genotype SA-I, clade II of the outbreak (sequences isolated from 2015 in the states of Goiás, Minas Gerais, and São Paulo states) (n = 91 sequences) ( Table B in S1 Text), according to our previous publication [18].

Phylogenetic analysis
Phylogenetic inference was performed with all the curated datasets. The phylogenetic tree was reconstructed based on the full-length, curated coding sequences using the Maximum Likelihood (ML) method implemented in IQ-TREE 1.5.5 [36]. The robustness of the groupings observed was assessed using an ultrafast bootstrap approximation (UFboot) during 1,000 replicates. The ML tree was visualized and plotted using FigTree v. 1.4.3 [37]. Taxon labels for sequences used in this work had the format: genotype/accession number/local of isolation/ date of isolation.

Phylogeographic analysis
All the sequences characterized as clade II were used to reconstruct the viral origin and to trace the movement of the C-II sequences (dataset-2). We explored the temporal signal (i.e., molecular clock structure) and the quality of our data set using TempEst v.1.5.3 [38]. The spatiotemporal spread was reconstructed under a Bayesian framework implemented in BEAST v.1.10.4 [39], using the general time-reversible model incorporating invariant sites with gamma-distributed rate variation substitution model (GTR+I+G), as described by the Bayesian information criterion (BIC) using jModelTest 2.1.10 [40]. Based on previous estimates of evolutionary dynamics of related YFV [17,41], we tested uncorrelated relaxed molecular clocks assuming a log-normal distribution, in combination with non-parametric population growth models: (i) the standard Bayesian skyline plot (BSP; 10 groups) [42], (ii) the Bayesian skyride plot [43], and (iii) the Bayesian skygrid model [42] (Tables C and D in S1 Text). Phylogeography patterns and parameters were estimated by running a Markov Chain Monte Carlo (MCMC) for 200 million states and sampling every 200.000 states with 10% burn-in. The effective sample size (ESS) and convergence were examined with Tracer v.1.7.1 [44]. The maximum clade credibility (MCC) tree was also visualized with FigTree v. 1.4.3 [37]. To calculate the log marginal likelihood for demographic model selection, we used the path sampling (PS) [45,46] and the stepping-stone (SS) sampling [47] approaches by running 100 path steps of 1 million iterations each. The spatiotemporal spread was visualised with spreaD3 v.0.9.7.1 [48].

Data sequences
The new sequences here characterized were deposited in GenBank under the accession numbers MW308134 and MW308135.

YFV epidemiological surveillance in São Paulo, 2016-2019
YFV emerged in the state of São Paulo and was first recognized in early 2016, through the detection of two human cases. The monitoring of epizootics revealed an ongoing occurrence of yellow fever in NHP. In 2016 and 2017, the number of NHP cases reported was higher than in humans. Since the end of 2017, YFV was reported in an extensive area where there was no recommendation for vaccination, mainly in the MRSP. In late 2018, a specific case was observed in the Atlantic Forest in the eastern region of the state, near the border with the state of Rio de Janeiro. In early 2019, some cases were observed in the south of the state, close to the border with the state of Paraná, and in the last two years of the epidemic (2018 and 2019), the number of human cases has suppressed the number of reported NHP cases (Fig 1A-1C).
From October 2018 to February 2019, three patients suspected of fatal yellow fever entered the SVOC-USP in the city of São Paulo. All patients had organs that tested positive for RT-qPCR. Two patients (patient 084 and patient 085) were confirmed positive for sylvatic YFV, and one of them tested positive for the vaccine strain YFV-17DD.

Clinical and demographic features associated with the sylvatic YF cases
The first case (patient 084) was a 26-year-old man, from the city of Cunha, a peasant, without any medical condition. He travelled one week before symptoms onset to Caraguatatuba, a region of Atlantic Forest, to work on a farm. He had no YF vaccination. His disease started on October 20 th , 2018, presenting asthenia, generalized myalgia, intense headache, fever, oliguria, nausea, vomits, abdominal pain, and developed icterus in the following days. He was transferred to São Paulo to the intensive care unit of the Infectology Institute "Emílio Ribas", on the 7 th day. The initial clinical hypothesis was: YF, leptospirosis, dengue, or Brazilian spotted fever. At admission, he was in a severe clinical condition, with torpor, bradycardia, and uremic, and was put under mechanical ventilation. During hospitalization fulminant hepatitis was diagnosed, with gastrointestinal haemorrhages, convulsions, and acute kidney injury, needing dialysis and multiple hemocomponents transfusion. He died on the 9 th day of disease onset. The diagnosis of YF was done by a positive anti-YFV IgM in serum (collected on the 7 th day); detectable YFV RNA in the blood by RT-PCR (7 th day), and autopsy findings. Based on the demographic history that preceded this patient's fatal condition, we decided to consider the probable site of infection as Caraguatatuba in the viral spread reconstruction.
The second case (patient 085) was a 58-year-old male, peasant, resident of Iporanga, close to areas with confirmed cases of YF epizootics and other human cases. He had a medical history of high blood pressure and was treated for prostate cancer. His symptoms started on January 15 th , 2019, with intense headache, fever, malaise, nausea, vomiting, and jaundice. The patient was transferred to IIER ICU with clinical suspicion of yellow fever. He evolved with fulminant hepatitis with coma and multiple organ dysfunction, treated with mechanical ventilation, hemocomponents transfusion, antibiotics, and other intensive measures. The patient died on the 5 th day of illness. The diagnosis of YF was done by a positive anti-YFV IgM in serum (collected on the 5th day); detectable YFV RNA in the blood by RT-PCR (4th day), and autopsy findings.
Both cases had mid-zonal hepatitis, with steatotic and apoptotic hepatocytes, sinusoidal congestion and Kupffer cells hyperplasia; diffuse gastrointestinal haemorrhages; pulmonary haemorrhages, lymphoid hypoplasia, and acute tubular necrosis. Liver tissues presented high levels of viral RNA when compared to other tissues (Fig 2).

Phylogenetic analysis
Based on our previous work that showed the efficiency in recovering quality viral YFV sequences from liver autopsy material of patients with clinical outcomes of death [18], we submitted the total RNA obtained from the patients' liver to total genomic RNA sequencing. In both, quality sequences were obtained, one with the complete coding region, and the other almost complete, having 20 contiguous bp that were not sequenced, that was assumed as "N" from the 6315-6334 site of the coding sequence (Fig A in S1 Text). However, we chose to keep this sequence in our analysis, since no nucleotide substitution in that region was observed in these positions in all clade II sequences (n = 91 sequences), which were assumed as possibly non-informative sites.

Phylogenetic characterization
The phylogenetic characterization of the two viruses sequenced here, compared to sequences previously characterized and deposited in the GenBank indicated that both sequences grouped within the South American genotype I, within the group of sequences associated with the outbreak identified in Brazil between 2016-2019, including sequences of midwest, southeast and northeast Brazil (Fig 3). The coding region of the two genomes sequenced here was grouped into a multi-Fasta file with another dataset of 89 non-redundant genomes recovered from GenBank (n = 91 sequences), previously filtered to exclude incomplete genomes, sequences with low quality, and sequences without information on location and date of collection. This alignment, with 91 sequences, was generated producing a file with 10.236 nucleotide sites, with 10.073 constant and invariant (constant or ambiguous constant) sites (98.41% of all sites). Among the variant sites, 60 sites were considered informative under parsimony and 202 sites had distinct site patterns. A root-to-tips analysis showed that the virus accumulated genetic diversity over time (r = 0.55) (Fig 4).

Phylogeographic characterization
The comparison between some of the different models of viral propagation and diffusion (Tables C and D in S1 Text) allowed us to estimate the pattern of viral dispersion with greater accuracy. According to the inferential results (Fig 5), the outbreak began in the state of Goiás

Discussion
Since the last urban cases of yellow fever reported in the Americas in the early 1950s, YFV has been maintained in forested areas through sylvatic cycles of transmission. The largest yellow fever outbreak in the 21 st century in the Americas began in 2016 passing through all states of southeast Brazil reaching the south and northeast regions of the country [14,18,19,49]. To reach the state of Minas Gerais, the virus possibly was carried across the state of Goiás, located in the west-central region, as the bridge from its natural maintenance area, the Amazon basin. Goiás is located in a dry savannah area (cerrado), which is a biome physically linked to the Amazon and Atlantic Forest [49,50]. In recommended areas for YF vaccination, the virus circulated silently causing epizootics in NHP. In areas with low vaccination coverage, YFV affected human populations causing a high number of deaths from 2017 to 2018 [18,22].
In the MRSP, the virus began its circulation in 2017, reaching the region from the south Minas Gerais, moving slowly and causing infections in NHP and humans along with its spread [18,24]. Despite circulating in forested regions close to densely inhabited areas in the MRSP during 2017-2018, YFV only infected susceptible people who had reported exposure to forested environments [15,16,18]. In the following epidemiological period (2018-2019), no cases were reported in NHP or humans in the MRSP. This can be explained by two factors: (i) YFV collapsed many populations of NHP in the region and (ii) the human population adhered to mass vaccination during the outbreak. During 2018-2019, the YF cases were reported mainly in the Vale do Ribeira (south), but also in the east of the state of São Paulo. In the case of the patient who lived in the city of Cunha, he visited a neighboring city (Caraguatatuba) at a time compatible with the viral incubation period, which is about 3 to 6 days before the onset of symptoms [51,52], which is located in the same region in a distance of 76 km away.
Multiple organs were positive for the presence of YFV RNA, as we have shown recently for other cases [15,16,18]. The liver was the organ with a high viral load in both patients. These findings reinforce the well-established evidence that the liver tissue is a replication site of YFV and the notion of multiple tissue compartmentalization in severe cases of infection. In some severe and non-fatal cases, hepatic tissue infection is seen months after the clinical outcome of the disease, causing late hepatitis, with positive molecular detection and evidence of lesions in pathological analyses [53].
In experimental models using NHP, the genome and viral antigens are found in multiple organs with a greater amount of virus, when compared to serum, between one to 30 days postinfection. Interestingly, neutralizing antibodies are found after the seventh day of infection, in consonance with the reduction of viremia and subsequently reduction of virus levels in tissues [54]. Our findings shown in Fig 2 support a model for YFV infection in humans based on autopsy and molecular findings, characterized by hepatitis, nephropathy, coagulopathy, and haemorrhagic vasculopathy that can lead to a fatal condition [16].
The viral sequences recovered from both patients reported here are grouped within the South American genotype I, alongside sequences previously characterized in previous years in São Paulo and neighboring states [14,18,49]. Within the group of outbreak sequences, they were grouped within clade II, which are sequences corresponding to viral isolates that circulated in the states of São Paulo, Minas Gerais, and Goiás [18,49].
The pattern of viral spread observed in the transition from 2018 to 2019 suggested that the densely urbanized area of the MRSP acted as a barrier to viral dispersion, deflecting the virus to spread to the east and south of the state. When comparing the transmission network of the 2015-2019 outbreak with other well-noted outbreaks from the past, we noticed that a very similar pattern of viral movement was observed in Brazil during outbreaks between 1932 and 1942 [50,55,56]. During those outbreaks, YFV reached the state of São Paulo from the Amazon Basin, also through the state of Goiás. Possibly, the maintenance of the viral spread pattern is associated with biological variables associated with virus-host interaction, including vectors and NHP, and geo-environmental variables, such as altitude, annual rainfall, and temperature [57]. As occurred between 1932-1942, YFV reached the MRSP and spread to the east and south regions of the state. Two possible hypotheses can explain the recent virus siege of the MRSP: (i) when the first cases of YFV were identified in NHP and human populations close to urban areas in the MRSP, the Department of Health of the state of São Paulo started mass campaigns for vaccination in peri-urban regions, causing an immunological protection belt around the densely populated area of the MRSP, and (ii) since the first half of the 20 th century, outbreaks in urban environments in Brazil caused by YFV are not documented, and one of the hypothesis is the lack of vector competence of the current Aedes aegypti populations in Brazil. However, a recent study demonstrated that different populations of Aedes aegypti in Brazil are competent for urban transmission of YFV [58]. Thus, vaccination associated with the circulation of other closely related flaviviruses in urban areas, as well as other ecological and epidemiological factors is believed to be involved in the continuous absence of urban cycles of transmission of YFV in the Americas.
In 2019, the YFV circulation decreased considerably compared to previous years, but with the circulation of the virus associated with clade I in the state of Rio de Janeiro with few cases [59], and in São Paulo associated with the clade II, some cases were reported. After 2019, NHP and human cases of YFV were reported in south Brazil, including the states of Paraná, Santa Catarina, and the Rio Grande do Sul [60][61][62]. Considering the limited number of YFV sequences available for better evaluation of the movement of YFV in these states, our analysis suggests that YFV that arrived in the state of Paraná moved to the southern region of the state of São Paulo, as they are border states.
Supporting information S1 Text. Fig A in S1 Text. Combined coverage (normalized by the sample average) along the two sequenced Yellow Fever virus (YFV) genomes generated in this study. The genomic position reflects the genomic organization of the YFV, which is organized as follows: a single polyprotein cleaved into three structural, including capsid (C), membrane (M), and envelope (E), and seven non-structural (NS) proteins named NS1, NS2A, NS2B, NS3, NS4A, NS4B, and NS5. The polyprotein is flanked by the 5' and 3' ends, which are non-coding. Table A in S1 Text. Complete YFV polyprotein sequences used in the phylogenetic analysis (dataset-1) (n = 264). Table B in S1 Text. Complete YFV polyprotein sequences used in the phylogeographic analysis (dataset-2) (n = 91). Table C in S1 Text. Model comparison of the relaxed molecular clock and demographic growth models through path sampling (PS) and stepping stone (SS) methods. Bold numbers indicate the best fitting model.